Abstract. Here we present a new fixed parameter tractable algorithm to compute the
hybridization number r of two rooted binary phylogenetic trees on taxon set X in time
(6r)^r · poly(n), where n = |X|. The novelty of this approach is that it avoids the use
of Maximum Acyclic Agreement Forests (MAAFs) and instead exploits the equivalence 
of the problem with a related problem from the softwired clusters literature. This offers 
an alternative perspective on the underlying combinatorial structure of the hybridization 
number problem. 

1 Introduction 

For notation and background we refer the reader to [2,3]. Let T = {T_1, T_2} be a set of two rooted
binary phylogenetic trees on X, where |X| = n. We may assume without loss of generality that T_1
and T_2 do not have any non-trivial common subtrees, so |X| ≥ 3. Let C = Cl(T) = Cl(T_1) ∪ Cl(T_2)
be the set of clusters obtained from T. Clearly, every taxon in X appears in at least one cluster
in C, and |C| ≤ 4(n − 1), because a binary tree on n taxa contains exactly 2(n − 1) edges.
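As a small illustration of the definition of Cl(T), the following Python sketch enumerates the clusters of a rooted binary tree (one cluster per edge, i.e. per non-root node) and checks the bound on |C|. The nested-tuple encoding of trees and the name `clusters` are assumptions made for this example only; they are not notation from the cited literature.

```python
def clusters(tree):
    """Return Cl(T) for a tree given as nested tuples with string leaves.

    One cluster is produced per non-root node, so a binary tree on n taxa
    yields 2(n - 1) clusters (singletons included, the full taxon set excluded).
    """
    found = set()

    def below(node, is_root):
        # Taxa reachable from `node` by directed paths.
        if isinstance(node, str):
            taxa = frozenset([node])
        else:
            taxa = frozenset().union(*(below(child, False) for child in node))
        if not is_root:
            found.add(taxa)
        return taxa

    below(tree, True)
    return found


# Two hypothetical trees on X = {a, b, c} with no non-trivial common subtree.
T1 = (("a", "b"), "c")
T2 = ("a", ("b", "c"))
n = 3
C = clusters(T1) | clusters(T2)
assert len(clusters(T1)) == 2 * (n - 1)
assert len(C) <= 4 * (n - 1)
```

The union C is strictly smaller than 4(n − 1) here because the singleton clusters are shared by both trees.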

Now, we know from [3] that r(C) = h(T) where h(T) is the hybridization number of the two 
trees T_1 and T_2. That is, the minimum number of reticulations required to display the two trees,
h(T), is equal to the minimum number of reticulations required to represent the union of the 
clusters obtained from the two trees, r(C). Hence we can concentrate on computing r(C). In [2,3]
it is also proven that there exists a binary network N that represents C such that r(C) = r(N), so 
we can further restrict our attention to binary networks. 

In [1] an algorithm with running time f(r(C)) · poly(n) is given to compute r(C) for an arbitrary
set of clusters, where f(r(C)) is a function of r(C) that does not depend on n and poly(n) is a
function of the form n^O(1). However, the running time of the algorithm in [1] is purely theoretical.
In the case of clusters obtained from two binary trees the running time can be more heavily 
optimized, which is the motivation for this note. 

2 A fixed parameter tractable algorithm for computing the 
hybridization number of two binary trees 

Given a cluster set C and x, y ∈ X, we write x →_C y if and only if every non-singleton cluster in
C containing x also contains y (see footnote 1). We say that a taxon x ∈ X is a terminal if there does
not exist x' ∈ X such that x ≠ x' and x →_C x'. In [5] it is proven that ∅ ⊂ X' ⊂ X is an ST-set of C if
and only if X' ∈ Cl(T_1) ∩ Cl(T_2) and the two subtrees induced by X' in T_1 and T_2 are identical.
Given that T_1 and T_2 are assumed to have no non-trivial common subtrees it follows that C has
no non-singleton ST-sets, a property we call ST-collapsed [2].
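The relation →_C and the notion of a terminal can be computed directly from their definitions. The sketch below is illustrative only: the function names `implies` and `terminals` are assumptions, and the example cluster set is the one used later in the proof of Lemma 5 to show that three terminals are possible when r(C) = 1.

```python
def implies(C, x, y):
    """x ->_C y: every non-singleton cluster of C containing x also contains y.

    If x lies in no non-singleton cluster the condition holds vacuously.
    """
    return all(y in c for c in C if len(c) > 1 and x in c)


def terminals(C, X):
    """Taxa x for which no other taxon x' satisfies x ->_C x'."""
    return {x for x in X if not any(implies(C, x, y) for y in X if y != x)}


# Cluster set {{a,b},{b,c},{d,c},{e,d}} on X = {a,b,c,d,e}.
X = set("abcde")
C = [frozenset("ab"), frozenset("bc"), frozenset("dc"), frozenset("ed")]
found = terminals(C, X)
```

In this example a and e each lie in a single non-singleton cluster, so a →_C b and e →_C d, and only b, c and d are terminals.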

Observation 1. Let C be an ST-collapsed set of clusters on X such that r(C) ≥ 1. Then the
relation →_C is a partial order on X, the terminals are the maximal elements of the partial order
and there is at least one terminal.



1 Note that, if a taxon x appears in only one cluster, {x}, then (vacuously) x →_C y for all y ≠ x.



Proof. The relation →_C is clearly reflexive and transitive. To see that it is anti-symmetric, suppose
there exist two elements x ≠ y ∈ X such that x →_C y and y →_C x. Then we have that, for every
non-singleton cluster C ∈ C, C ∩ {x, y} is either equal to ∅ or {x, y}, i.e. C is compatible with {x, y}.
Furthermore, the only clusters that can possibly be in C|{x, y} are {x}, {y} and {x, y}, and these
are all mutually compatible. So {x, y} is an ST-set, contradicting the fact that C is ST-collapsed.
Hence →_C is a partial order. The fact that the terminals are the maximal elements of the partial
order then follows immediately from their definition. Finally, it is well known that every partial
order on a finite set of elements contains at least one maximal element (because otherwise a cycle
exists which contradicts the anti-symmetry property). □

Let T be a phylogenetic tree on X. For a vertex u of T we define X(u) ⊆ X to be the set of
all taxa that can be reached from u by directed paths. For a taxon x ∈ X we define W^T(x), the
witness set for x in T, as X(u) \ {x}, where u is the parent of x. A critical property of W^T(x) is
that, for any non-singleton cluster C ∈ Cl(T) that contains x, W^T(x) ⊆ C.
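A witness set can be read off a tree by locating the parent of x and collecting the other taxa below it. The following sketch assumes trees are encoded as nested tuples with string leaves; this encoding and the names `leaf_set` and `witness` are hypothetical and serve only to illustrate the definition.

```python
def leaf_set(node):
    """X(u): all taxa reachable from `node` by directed paths."""
    if isinstance(node, str):
        return frozenset([node])
    return frozenset().union(*(leaf_set(child) for child in node))


def witness(tree, x):
    """W^T(x) = X(u) minus {x}, where u is the parent of x in `tree`.

    Returns None if x does not occur as a leaf with a parent in `tree`.
    """
    if isinstance(tree, str):
        return None
    # If x is a child of this node, this node is the parent u.
    if any(child == x for child in tree):
        return leaf_set(tree) - {x}
    # Otherwise search the subtrees.
    for child in tree:
        found = witness(child, x)
        if found is not None:
            return found
    return None


# Two hypothetical trees on X = {a, b, c}.
T1 = (("a", "b"), "c")
T2 = ("a", ("b", "c"))
b_is_isolated = not (witness(T1, "b") & witness(T2, "b"))
```

Here the witness sets of b in the two trees are {a} and {c}, which are disjoint, whereas the witness sets of a are {b} and {b, c}, which share b.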

Observation 2. Let C = Cl(T) be a set of clusters on X, where T = {T_1, T_2} is a set of two
binary trees on X with no non-trivial common subtrees, and r(C) ≥ 1. Then for any x ∈ X the
following statements are equivalent: (1) x is a terminal of C; (2) there exist incompatible clusters
C_1, C_2 ∈ C such that C_1 ∩ C_2 = {x}; (3) W^{T_1}(x) ∩ W^{T_2}(x) = ∅.

Proof. We first prove that (2) implies (1). For x' ∉ C_1 ∪ C_2 it holds that x ↛_C x', because x ∈ C_1 but
x' ∉ C_1. For x' ∈ C_1 \ C_2 it cannot hold that x →_C x', because x ∈ C_2 but x' ∉ C_2, and this holds
symmetrically for x' ∈ C_2 \ C_1. Hence x is a terminal. We now show that (1) implies (3). Suppose
(3) does not hold. Then there exists some taxon x' ∈ W^{T_1}(x) ∩ W^{T_2}(x). So every non-singleton
cluster in C that contains x also contains x', irrespective of whether the cluster came from T_1 or
T_2. But then x →_C x', so (1) does not hold. Hence (1) implies (3). Finally, we show that (3) implies
(2). Note that (3) implies that in both T_1 and T_2 the parent of x is not the root. If this was not
so, then (wlog) W^{T_1}(x) = X \ {x}, and combining this with the fact that W^{T_1}(x), W^{T_2}(x) ≠ ∅
would contradict (3). Hence W^{T_1}(x) ∪ {x} ∈ Cl(T_1) and W^{T_2}(x) ∪ {x} ∈ Cl(T_2). These two
clusters intersect only in x and neither contains the other, so they are incompatible, from which (2)
follows. □

Observation 3. Let C = Cl(T) be a set of clusters on X, where T = {T_1, T_2} is a set of two
binary trees on X with no non-trivial common subtrees, and r(C) ≥ 1. Then, for any two taxa
x ≠ y ∈ X, x →_C y if and only if y ∈ W^{T_1}(x) ∩ W^{T_2}(x).

Proof. Suppose x →_C y, but (wlog) y ∉ W^{T_1}(x). The parent of x in T_1 cannot be the root, because
then W^{T_1}(x) = X \ {x} which contains y. So there is an edge in T_1 whose head is the parent of
x. Let C ∈ C be the cluster represented by this edge, then C = {x} ∪ W^{T_1}(x). W^{T_1}(x) is non-empty
and contains neither x nor y, so C is a non-singleton cluster which contains x but not y. So
x ↛_C y, a contradiction. In the other direction, suppose y ∈ W^{T_1}(x) ∩ W^{T_2}(x). Let C ∈ C be a
non-singleton cluster that contains x. Every non-singleton cluster C ∈ C is from Cl(T_1) or Cl(T_2),
so W^{T_1}(x) ⊆ C or W^{T_2}(x) ⊆ C. In any case it follows that y ∈ C. □

Lemma 1. Let C = Cl(T) be a set of clusters on X, where T = {T_1, T_2} is a set of two binary
trees on X with no non-trivial common subtrees, and r(C) ≥ 1. Let x be any taxon in X. If x is
not a terminal of C then there exists a terminal y such that x →_C y.

Proof. C is ST-collapsed because T_1 and T_2 contain no non-trivial common subtrees. Hence, by
Observation 1, we know that →_C is a partial order on X and the terminals, of which there is at
least one, are the maximal elements of the partial order. The result then follows immediately from
the transitivity and anti-symmetry property of partial orders. □

Lemma 2. Let C = Cl(T) be a set of clusters on X, where T = {T_1, T_2} is a set of two binary
trees on X with no non-trivial common subtrees, and r(C) ≥ 1. Then there exists x ∈ X such that
r(C \ {x}) < r(C).



Proof. Consider a binary network N which represents C, where r(N) = r(C). By acyclicity N
contains at least one Subtree Below a Reticulation (SBR) [2], i.e. a node u with indegree-1 whose
parent is a reticulation, and such that no reticulation can be reached by a directed path from u.
Let X' be the set of taxa reachable from u by directed paths. X' is an ST-set, so |X'| = 1 (because
C is ST-collapsed). Let x be the single taxon in X'. Deleting x and its reticulation parent from N
(and tidying up the resulting network in the usual fashion) creates a network N' on X \ {x} with
r(N') < r(N) that represents C \ {x}. □

Lemma 3. Let C = Cl(T) be a set of clusters on X, where T = {T_1, T_2} is a set of two binary
trees on X with no non-trivial common subtrees, and r(C) ≥ 1. Then for each x ∈ X it holds that
r(C) − 1 ≤ r(C \ {x}) ≤ r(C).

Proof. The second ≤ is immediate because removing a taxon from a cluster set cannot raise the
reticulation number of the cluster set. The first ≤ holds because in [2] it is shown how, given any
network N' on X \ {x} that represents C \ {x}, we can extend N' to obtain a network N on X
that represents C such that r(N) ≤ r(N') + 1. □

Recall from [2] the definition of an ST-set tree sequence of a cluster set C. Let (S_1, S_2, ..., S_p)
be an ST-set tree sequence of C of minimum length, where C = Cl(T) = Cl(T_1) ∪ Cl(T_2). In [2] it
is proven that p = r(C).

Lemma 4. Let C = Cl(T) be a set of clusters on X, where T = {T_1, T_2} is a set of two binary
trees on X with no non-trivial common subtrees, and r(C) ≥ 1. Suppose there exist distinct taxa
a, b, c ∈ X such that {a, b} and {b, c} are both clusters in C. Then there exists x ∈ {a, b, c} such
that r(C \ {x}) = r(C) − 1.

Proof. We know from [2] that there exists an ST-set tree sequence (S_1, ..., S_p) of C where p = r(C).
It is clear that at least one of {a, b, c} has to occur in one of the S_i sets, because otherwise the
two clusters {a, b} and {b, c} survive even after S_p has been removed, contradicting the fact that
it is a tree sequence. Now, let 1 ≤ i ≤ p be the smallest i such that S_i ∩ {a, b, c} ≠ ∅. Note
that, at the point just before the ST-set S_i is removed, none of {a, b}, {b, c}, {a, c}, {a, b, c} are
ST-sets (because of the incompatible pair of clusters {a, b} and {b, c}). So |S_i ∩ {a, b, c}| < 3.
Furthermore, |S_i ∩ {a, b, c}| ≠ 2 because at least one of the two clusters {a, b} and {b, c} will be
incompatible with S_i. So |S_i| = 1 and S_i ⊆ {a, b, c}. Let x be the single taxon in S_i. This means
that (S_1, ..., S_{i−1}, S_{i+1}, ..., S_p) is an ST-set tree sequence of C \ {x} of length p − 1. We conclude
that r(C \ {x}) < r(C) and (by Lemma 3) we have that r(C \ {x}) = r(C) − 1. □

Theorem 1. Let C = Cl(T) be a set of clusters on X, where T = {T_1, T_2} is a set of two binary
trees on X with no non-trivial common subtrees, and r(C) ≥ 1. Then at least one of the following
two conditions holds: (1) there exist distinct taxa a, b, c ∈ X such that {a, b} and {b, c} are both
clusters in C; (2) there exists a taxon x ∈ X such that r(C \ {x}) = r(C) − 1 and x is either a
terminal of C or is in some size-2 cluster with a terminal of C.

Proof. Towards a counter-example, assume that the claim does not hold for C. For each x ∈ X
that is not a terminal, let M(x) be an arbitrary terminal such that x →_C M(x), which must exist
by Lemma 1. The mapping M will remain unchanged for the rest of the proof.

Let N be an arbitrary binary network that represents C such that r(N) = r(C). Due to the
fact that C is ST-collapsed, every SBR consists of exactly one taxon, so we can unambiguously
identify each SBR by its corresponding taxon. Let R(N) ⊆ X be the set of SBRs of N. Clearly, if
R(N) contains a terminal we are done (by the same argument as used in the proof of Lemma 2).

For x ∈ R(N), we define the detach and re-hang above a terminal (DRHT) operation as follows.
We first delete x and tidy up the resulting network in the usual fashion, which causes the parent
reticulation of x and its reticulation edges to disappear. This creates a network N' on X \ {x}
that represents C \ {x}, where r(N') = r(N) − 1 (by Lemma 3; see footnote 2). Let δ be the root of N' and
p be the parent of M(x) in N'. We construct a new network N'' from N' by deleting the edge
(p, M(x)), introducing new nodes p', r, r', adding the edges (p, p'), (p', M(x)), (p', r), (δ, r), (r, r')
and finally labelling r' with x. This potentially raises the outdegree of the root above 2 (i.e. makes
the network non-binary) but this could easily be addressed by replacing the root with a chain of
nodes of indegree at most 1 and outdegree 2 (see e.g. [3]); the exposition is easier to follow if we
permit a high-degree root. As observed in [3], N'' represents C. To summarise the argument from
[2], let C ∈ C be a non-singleton cluster such that x ∉ C. Clearly C is represented by N'. The edge
in N' that represents C will still represent C in N'' if we switch the reticulation edge (p', r) off
and the reticulation edge (δ, r) on. So suppose C is a non-singleton cluster that does contain x.
In this case M(x) ∈ C \ {x}. So the edge in N' that represents C \ {x} can represent C in N'' by
switching the reticulation edge (p', r) on and the reticulation edge (δ, r) off. Hence N'' represents
all clusters C ∈ C and r(N'') = r(N). Note that, in general, R(N'') ≠ R(N).

2 Note that Lemma 3 also prevents that, prior to the tidying-up phase, a multi-edge is created, because
then both the parent reticulation of x and the reticulation at the end of the multi-edge would disappear
in the tidying-up phase, meaning that r(N') ≤ r(N) − 2.

We will repeatedly apply the DRHT operation to transform N into a network with a "canonical"
form. Specifically, choose an arbitrary x ∈ R(N) and let R^0 = {x}. Let N^0 = N and let
N^1 be the network obtained by applying the DRHT operation to x in N. We apply the following
procedure, starting with i = 0:

(1) If R(N^{i+1}) contains a terminal then stop.

(2) If R(N^{i+1}) \ R^i = ∅ then stop.

(3) Otherwise, let y be an arbitrary taxon in R(N^{i+1}) \ R^i, let R^{i+1} = R^i ∪ {y} and let N^{i+2} be
obtained by applying the DRHT operation to y in N^{i+1}. Increment i and go back to (1).

The procedure will definitely stop because with each new iteration |R^{i+1}| > |R^i|, and R^i ⊆ X.
Let N* be the final network obtained. Note that if the procedure stops at line (1) the proof is
complete: N* is a network that represents C with r(C) reticulations that has a terminal as an SBR.
So we can assume that the procedure stops at line (2). Clearly, there are no terminals in R(N*).
Furthermore, for each x ∈ R(N*) we know that at some iteration a DRHT operation was performed
on x, because otherwise the procedure could continue for at least one more iteration. In the iteration
when this happens, the tail of one reticulation edge (of the parent reticulation of x) is attached to
the root, and the other "just above" M(x), i.e. at p'. In subsequent iterations M(x) will never undergo
a DRHT operation, because it is a terminal. Also, x is not a terminal (and thus is not in the range
of M) so the edge between x and its parent reticulation will never be subdivided by a DRHT operation.
Furthermore, both reticulation edges will remain intact because, after the tidying-up phase,
the DRHT operation only subdivides edges whose head is a taxon. In fact, the only relevant change
that can happen is that the edge (p', M(x)) is subdivided by later DRHT operations; specifically,
DRHT operations applied to some non-terminal y ≠ x such that M(x) = M(y). Whether or not
this happens, it follows that in N* for every x ∈ R(N*) one parent of the (parent reticulation of)
x is the root, and the other parent is a node t(x) such that a directed tree-path (i.e. a path that
contains no reticulations) exists from t(x) to M(x).

We will continue to focus on N*. We say that a directed path is a root-reticulation path if it
starts at the root and terminates at a reticulation. The reticulation length of such a path is the
number of reticulations in it (including the end node). Observe that the last node r on a
root-reticulation path of maximum reticulation length must be (the parent reticulation of) an SBR.
If this was not so then a previously unvisited reticulation r* ≠ r is reachable by a directed path
from r, thus contradicting the assumption that the root-reticulation path had maximum reticulation
length.

Consider an arbitrary x' ∈ R(N*) corresponding to the SBR at the end of a root-reticulation
path of maximum reticulation length. Let x be the taxon in R(N*) such that M(x) = M(x') and which,
amongst all such taxa, most recently underwent a DRHT operation; it might be that x = x'. Note
that, by construction, x is also at the end of a root-reticulation path of maximum reticulation
length. Furthermore, one parent of (the parent reticulation of) x is the root, and the other is a
node t(x) such that (t(x), M(x)) is an edge in N*. We are now ready for the core argument in the
proof. We walk backwards from t(x) towards the root until we encounter a vertex u for which one
of the following three mutually exclusive cases holds: (a) u is a reticulation; (b) u is a tree-node
(i.e. a node that is not a reticulation) from which some taxon y ∈ X \ {M(x)} can be reached by a
directed tree-path; (c) u is the root and (b) does not hold.

Before commencing with the case analysis we argue that N* has a specific form. Let t be an
intermediate node on the directed path from u to t(x). We know t is a tree-node from which no
taxon (other than M(x)) can be reached by a directed tree-path. So all maximal directed tree-paths
starting at t that do not terminate at M(x) must terminate at the parent of a reticulation
r. But then there exists a root-reticulation path of maximum reticulation length terminating at r,
so r is actually the parent of an SBR. Let x' be the taxon corresponding to this SBR. We know
(by construction of N*) that the non-root parent of r can reach M(x') by a directed tree-path. If
M(x') ≠ M(x) then (b) would actually have held for node t, because merging the two directed
tree-paths would create a directed tree-path from t to M(x'), and the backwards walk would have
terminated earlier. So M(x) = M(x') and hence (t, r) is an edge in N*. From this we can conclude
that, for each intermediate node t, the child of t that does not lie on the path from u to t(x) is (the
parent reticulation of) an SBR, and moreover all such SBRs map to M(x). We now commence
the case analysis.

Case (a). In this case M(x) is the only taxon reachable by directed tree-paths from the child of
reticulation u. This is depicted in Figure 1. Consider the network N** obtained from N* by deleting
M(x) and suppressing its parent; in particular, consider how this changes Figure 1. Clearly,
N** represents C \ {M(x)}. Now, there is some ST-set tree sequence of C \ {M(x)} that begins
({x}, {x'}, ..., {x''}, ...), because singletons are always ST-sets. Each time one of these ST-sets is
removed from N** the reticulation number of N** drops by exactly one, except at the point when
{x''} is removed, because at this point the reticulation u will also disappear, causing the reticulation
number of the network to drop by two. Hence we can conclude that C \ {M(x)} has an ST-set
tree sequence of length r(N*) − 1 = r(C) − 1, and we are done.

Case (b). Let y ≠ M(x) be the taxon that can be reached by a directed tree-path from u.
This is depicted in Figure 2. M(x) is a terminal, so there must exist some non-singleton cluster
C ∈ C such that M(x) is in C, but y is not in C. Critically, the only edges that can represent such
a cluster lie on the path from u to t(x). Hence there exists R' ⊆ R(N*) such that C = R' ∪ {M(x)}
and for all x' ∈ R', M(x') = M(x). So for each x' ∈ R' we know that x' →_C M(x). Hence in both T_1
and T_2 the parent of M(x) must be reachable by a directed path from the parent of x' (possibly of
length 0). Suppose there exist y', z' ∈ X such that y' ∈ W^{T_1}(M(x)), z' ∈ W^{T_2}(M(x)) and neither
y' nor z' is in R'. But then every non-singleton cluster in C that contains M(x) contains either y'
or z'. Hence C ∉ C, which is obviously not possible. So there must be some element x'' of R' that
appears in (wlog) W^{T_1}(M(x)). But there must also be a directed path from the parent of x'' in T_1
to the parent of M(x) in T_1. So x'' must be a sibling of M(x), i.e. {x'', M(x)} ∈ C. Furthermore,
x'' is an SBR, and M(x) is a terminal, so we are done.

Case (c). In this case the network must look like Figure 3, because the maximum reticulation
length of a root-reticulation path is 1. M(x) is in at least one non-singleton cluster C (otherwise
it would not be a terminal) so there again exists R' ⊆ R(N*) such that C = R' ∪ {M(x)} and for
all x' ∈ R', M(x') = M(x). (In this case R(N*) = X \ {M(x)}.) The rest of the analysis is the
same as case (b). □

Lemma 5. Let C be an ST-collapsed set of clusters on X such that r(C) ≥ 1. Then C contains at
most 3 · r(C) terminals.

Proof. Given that C is ST-collapsed, we know that there exists a binary network N with r(N) =
r(C) that represents C such that N can be obtained by performing the leaf-hanging operation to
some r(C)-reticulation generator [2]. The 1-reticulation and 2-reticulation generators are shown
in Figure 4. Recall that the edges of a generator are called edge sides and that the nodes of a
generator with indegree-2 and outdegree-0 are called node sides. For a generator G we let I(G) be
any maximum size subset of edge sides such that, for every two sides s ≠ s' in I(G), there is no
directed path from the head of s to the tail of s' such that all nodes on the path (including the
head of s and the tail of s') are tree-nodes. We let R(G) be the set of node sides of G, and we
define t(G) as |R(G)| + |I(G)|. We define t(r) (where r ≥ 1) as the maximum value of t(G) ranging
over all r-reticulation generators G. Observe that, if r(C) = r, then t(r) is an upper bound on the
number of terminals in C. This follows because there can be at most one terminal per edge side
and, more generally, it is not possible to place two terminals x ≠ y on the sides of the generator
such that a directed tree-path exists from the parent of x to the parent of y, because then x →_C y.
To prove the lemma we will show that t(r) ≤ 3r for r ≥ 1.

Fig. 1. This is case (a) in the proof of Theorem 1. Here u is a reticulation and the only taxon
reachable by a directed tree-path from the child of u is M(x). Each intermediate node on the path
from u to M(x) is the tail of a reticulation edge that feeds into (the parent reticulation of) an
SBR; the other reticulation edge is attached to the root. All these SBRs x, x', ..., x'' are such that
M(x) = M(x') = ... = M(x'').

Fig. 2. This is case (b) in the proof of Theorem 1. Here u is a tree-node and there is a taxon
y ≠ M(x) reachable by a directed tree-path from u. Each intermediate node on the path from
u to M(x) is the tail of a reticulation edge that feeds into (the parent reticulation of) an SBR;
the other reticulation edge is attached to the root. All these SBRs x, x', ..., x'' are such that
M(x) = M(x') = ... = M(x'').

Fig. 3. This is case (c) in the proof of Theorem 1. Here u is the root and X = {x, x', ..., x''} ∪
{M(x)}. Each intermediate node on the path from u to M(x) is the tail of a reticulation edge
that feeds into (the parent reticulation of) an SBR; the other reticulation edge is attached to the
root. All these SBRs x, x', ..., x'' are such that M(x) = M(x') = ... = M(x'').

We will prove this by induction. The base case r = 1 is straightforward. There can be
at most three terminals placed on the 1-reticulation generator: one on the node side and one
on each of the two edges whose head is the node side, see Figure 4. (The cluster set C =
{{a, b}, {b, c}, {d, c}, {e, d}}, for which r(C) = 1, shows that three terminals are actually possible.)

Observe that, by acyclicity, every generator has at least one node side. Furthermore, if we (1)
delete a node side s from an r-reticulation generator, (2) delete all leaves that are created (nodes
with indegree-1 and outdegree-0) and (3) suppress all nodes with indegree and outdegree both
equal to 1, we obtain an (r − 1)-reticulation generator. For example, observe how applying steps
(1)-(3) to any 2-reticulation generator creates the unique 1-reticulation generator.

Now, for the sake of contradiction assume that r > 1 is the smallest value such that t(r) > 3r.
Let G be an r-reticulation generator such that t(G) ≥ 3r + 1. Let I(G) be a subset of the edge
sides, as defined above, such that t(G) = |R(G)| + |I(G)|. Locate an arbitrary node side s of G.
We will show that deleting s and applying steps (1)-(3) as described above will create an
(r − 1)-reticulation generator G' such that t(G') ≥ t(G) − 3. Then t(r − 1) ≥ t(G') ≥ 3(r − 1) + 1,
yielding a contradiction on the assumed minimality of r.
There are several cases, conditional on the positions of the tails of the two edges that enter s.

In case (i) the two tails are distinct and both have indegree-2 and outdegree-1. In this case,
by the maximality of I(G), both the edges entering s will be in I(G). Hence, |I(G')| = |I(G)| − 2
and |R(G')| = |R(G)| + 1, so t(G') = t(G) − 1.

In case (ii) the two tails are distinct and both have indegree-1 and outdegree-2. Clearly |R(G')| =
|R(G)| − 1. Let u be the tail of the first edge that enters s. Let p(u) be the parent of u and c(u) the
child of u not equal to s. The critical observation is that at most one of (u, c(u)) and (p(u), u) is
in I(G). So deleting s will delete the edge (u, s) from I(G), if it is there, and if one of (u, c(u)) and
(p(u), u) is in I(G) then this can be deleted and replaced by the new edge (p(u), c(u)). The same
analysis holds for the second edge (v, s) entering s. Hence |I(G')| ≥ |I(G)| − 2, and this completes
this case. Note that the analysis still holds if (wlog) p(v) = u, because then at most one of the
three edges (p(u), u), (u, v), (v, c(v)) will be in I(G), and if this occurs this edge can be replaced
by the new edge (p(u), c(v)).

In case (iii) both tails are distinct, one tail has indegree-2 and outdegree-1 and the other
tail has indegree-1 and outdegree-2. Then, by combining the insights from the first two cases,
|R(G')| = |R(G)| and |I(G')| ≥ |I(G)| − 2.

Finally, in case (iv) the two tails are the same vertex u, i.e. s is the head of a multi-edge.
Note that at most two of the three edges (p(u), u), (u, s), (u, s) can be in I(G). Now, suppose p(u)
has indegree-2 and outdegree-1. Then |R(G')| = |R(G)| and |I(G')| ≥ |I(G)| − 2. In the case that
p(u) has indegree-1 and outdegree-2 we have that |R(G')| = |R(G)| − 1, and let p' be the parent
of p(u) and c' be the child of p(u) not equal to u. Again, at most one of the two edges (p', p(u))
and (p(u), c') will be in I(G), and if necessary this can be replaced by the new edge (p', c'). So
|I(G')| ≥ |I(G)| − 2. □
Fig. 4. The single 1-reticulation generator and the seven 2-reticulation generators.



Lemma 6. Let C = Cl(T) be a set of clusters on X, where T = {T_1, T_2} is a set of two binary
trees on X with no non-trivial common subtrees, and r(C) ≥ 1. Then there exists X' ⊆ X such
that (1) |X'| ≤ 6 · r(C) and (2) there exists x ∈ X' such that r(C \ {x}) = r(C) − 1. Furthermore
such a set X' can be computed in polynomial time.



Proof. If situation (1) from the statement of Theorem 1 holds then we can simply take X' =
{a, b, c}, so |X'| = 3 ≤ 6 · r(C). Otherwise we are in situation (2) and we take X' to be the union
of all terminals of C plus all taxa that appear in some size-2 cluster of C with some terminal. From
Lemma 5 we know that there are at most 3 · r(C) terminals in C. Observe that each such terminal
can be in at most one size-2 cluster, because otherwise there would exist two incompatible size-2
clusters, i.e. situation (1) of Theorem 1 would hold. Hence there can be at most 3 · r(C)
non-terminals that appear in size-2 clusters with terminals, from which |X'| ≤ 6 · r(C) follows. Given
the characterisation described in Observation 2, and the fact that |C| ≤ 4(n − 1), it is easy to see
that the set X' can be computed in (low-order) polynomial time. □
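The construction of X' in this proof can be sketched directly on a cluster set. The code below is illustrative only: the names `terminals` and `candidate_set` are assumptions, terminals are computed from the definition rather than via the witness-set characterisation of Observation 2, and no attempt is made to match the low-order polynomial running time mentioned above.

```python
from itertools import combinations


def terminals(C, X):
    """Taxa x with no other taxon y such that x ->_C y."""
    def implies(x, y):
        # Every non-singleton cluster containing x also contains y.
        return all(y in c for c in C if len(c) > 1 and x in c)
    return {x for x in X if not any(implies(x, y) for y in X if y != x)}


def candidate_set(C, X):
    """Return a set X' following the two situations of Theorem 1."""
    pairs = [c for c in C if len(c) == 2]
    # Situation (1): two size-2 clusters {a, b} and {b, c} sharing one taxon.
    for p, q in combinations(pairs, 2):
        if len(p & q) == 1:
            return set(p | q)
    # Situation (2): all terminals, plus their partners in size-2 clusters.
    term = terminals(C, X)
    partners = {y for p in pairs if p & term for y in p}
    return term | partners


# Hypothetical cluster set with no pair of overlapping size-2 clusters.
C = [frozenset("ab"), frozenset("abc"), frozenset("cd"), frozenset("acd")]
X_prime = candidate_set(C, set("abcd"))
```

In the example the terminals are a and c, and their size-2 partners b and d are added, so every taxon ends up in X'.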

Theorem 2. Let C = Cl(T) be a set of clusters on X, where T = {T_1, T_2} is a set of two binary
trees on X. Then r(C) = h(T) can be computed in time (6r)^r · poly(n), where r = r(C) and
n = |X|.

Proof. The algorithm is simple. We repeat the following steps until a compatible cluster set (i.e.
a set of clusters that can be represented by a tree) is created: (1) collapse any common subtrees
into single taxa (adjusting C as necessary), which can be done in (low-order) polynomial time, (2)
construct the set X' described in Lemma 6 and then (3) "guess" an element x ∈ X' such that
r(C \ {x}) = r(C) − 1.

At each iteration the guessing can be simulated simply by trying all (at most) 6r elements in
X'. (Note that the X' sets that arise will never have more than 6r elements because removing taxa
from a cluster set cannot raise the reticulation number of the cluster set.) If we traverse this search
tree in breadth-first fashion and stop as soon as we have created a compatible cluster set then the
depth of the search tree will equal r = r(C), requiring at most O((6r)^r) guesses in total. □
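The overall collapse-and-branch scheme can be sketched as follows, under several simplifying assumptions that are not part of the proof above. Each tree is encoded as the set of its non-singleton clusters other than the full taxon set; for two binary trees on the same taxa the union of the cluster sets is then compatible exactly when the two sets are equal. The sketch uses iterative deepening in place of breadth-first traversal, and for brevity it branches on every remaining taxon rather than on the set X' of Lemma 6, so it remains correct by Lemmas 2 and 3 but does not attain the (6r)^r bound. All function names are hypothetical.

```python
def delete_taxon(X, c1, c2, x):
    """Restrict both trees to X minus {x}, dropping trivial clusters."""
    rest = X - {x}

    def cut(clusters):
        return frozenset(c - {x} for c in clusters
                         if 1 < len(c - {x}) < len(rest))

    return rest, cut(c1), cut(c2)


def collapse(X, c1, c2):
    """Contract common subtrees to single representative taxa until none remain."""
    changed = True
    while changed:
        changed = False
        for S in sorted(c1 & c2, key=len, reverse=True):
            # S roots a common subtree iff both trees agree strictly below S.
            if {c for c in c1 if c < S} == {c for c in c2 if c < S}:
                rep = min(S)

                def contract(clusters):
                    return frozenset((c - S) | {rep} if S < c else c
                                     for c in clusters if not c <= S)

                X, c1, c2 = (X - S) | {rep}, contract(c1), contract(c2)
                changed = True
                break
    return X, c1, c2


def solvable(X, c1, c2, k):
    """Can at most k taxon deletions (with collapsing) make the trees agree?"""
    X, c1, c2 = collapse(X, c1, c2)
    if c1 == c2:
        return True
    if k == 0:
        return False
    return any(solvable(*delete_taxon(X, c1, c2, x), k - 1) for x in X)


def hybridization_number(X, c1, c2):
    """Smallest k for which the bounded search succeeds."""
    k = 0
    while not solvable(X, c1, c2, k):
        k += 1
    return k


# Hypothetical input: T_1 = ((a,b),(c,d)) and T_2 = ((a,c),(b,d)).
X = frozenset("abcd")
c1 = frozenset({frozenset("ab"), frozenset("cd")})
c2 = frozenset({frozenset("ac"), frozenset("bd")})
h = hybridization_number(X, c1, c2)
```

Replacing the loop over all of X in `solvable` with a loop over X' would recover the branching factor of at most 6r used in the proof.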

3 Conclusion 

We have presented a new fixed parameter tractable algorithm for computing the hybridization 
number of two binary phylogenetic trees. The algorithm is unusual in the sense that it attacks the 
problem indirectly: it works within the softwired clusters model (which does not require the full 
topology of the input trees to be preserved) and links the optima together using the unification 
results in [3,2]. We hope that this will stimulate new insights into the underlying combinatorial
structure of the hybridization number problem. 